Abstract:
Trillions of bytes of data are generated every day in different forms, and extracting useful information from that massive amount of data is the study of data mining. Sequential pattern mining is a major branch of data mining that deals with mining frequent sequential patterns from sequence databases. Because items have different importance in real-life scenarios, they cannot be treated uniformly. With today's datasets, the use of weights in sequential pattern mining is much more feasible. In most real-life datasets, pushing weights into the mining process gives a better understanding of the dataset, as it also measures the importance of an item inside a pattern rather than treating all items equally. Many techniques have been introduced to mine weighted sequential patterns, but these algorithms typically generate a massive number of candidate patterns and take a long time to execute. This work introduces a new pruning technique and a complete framework that takes much less time and generates a small number of candidate sequences without compromising completeness. Performance evaluation on real-life datasets shows that our proposed approach can mine weighted patterns substantially faster than other existing approaches.
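To make the idea of weighting concrete, the following is a minimal sketch of one common way to define the weighted support of a sequential pattern: the fraction of sequences containing the pattern, multiplied by the average weight of the pattern's items. This definition, the function names, and the toy data are assumptions for illustration only; the abstract does not state which weighting scheme or pruning technique the authors use.

```python
from typing import Dict, List, Sequence


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """Return True if the items of `pattern` occur in `sequence` in order."""
    it = iter(sequence)
    # `item in it` advances the iterator, so order is respected.
    return all(item in it for item in pattern)


def weighted_support(
    pattern: Sequence[str],
    database: List[Sequence[str]],
    weights: Dict[str, float],
) -> float:
    """Assumed definition: relative support times average item weight."""
    support = sum(is_subsequence(pattern, s) for s in database) / len(database)
    avg_weight = sum(weights[item] for item in pattern) / len(pattern)
    return support * avg_weight


# Hypothetical toy data: four sequences and per-item weights.
database = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]]
weights = {"a": 0.9, "b": 0.5, "c": 0.2}

ws_ab = weighted_support(["a", "b"], database, weights)
ws_bc = weighted_support(["b", "c"], database, weights)
print(ws_ab, ws_bc)
```

Both patterns occur in half of the sequences, yet the pattern built from heavier items scores higher, which is the effect that weighting is meant to capture.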
Abstract:
Sequences are one of the most important types of data. Recently, the mining and analysis of sequence data have been studied in several fields. DNA sequences may contain characters that do not exist in the alphabet. This is related to a function of the DNA that has been preserved in the evolutionary process of an organism. Discovering patterns in DNA sequence datasets that contain gaps is a very hard task for algorithms. We present an algorithm that discovers such DNA sequences in datasets. Our algorithm uses a sequential pattern mining method for this problem.
Abstract:
Frequent pattern mining has become very useful and interesting to researchers due to its high applicability. Many real-life databases (e.g., sensor network data, medical diagnosis data) are uncertain in nature. Many algorithms have been developed to mine frequent uncertain patterns based on expected support values. Nonetheless, those algorithms are limited to finding frequent patterns by using some filtering constraints. Moreover, it is challenging to find the patterns that are actually interesting, as different patterns carry different importance. In this work, a new framework is proposed to mine sequences in uncertain databases satisfying both weight and support constraints. Subsequently, an efficient algorithm (uWSequence) is developed to discover the uncertain weighted sequences. In addition, the pruning measures iMaxPr and expSupport(top) play a vital role in making uWSequence remarkably time-efficient by filtering out unfavorable patterns at early stages. The applicability of the proposed framework is shown by solving various problems (e.g., weather forecasting, sensor-based event finding). To our knowledge, ours is the first work on weighted sequences in uncertain databases. Extensive performance analysis confirms the efficiency of the proposed algorithm as well as its superiority over the existing algorithms. (C) 2018 Elsevier Inc. All rights reserved.
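As an illustration of the expected support notion mentioned above, here is a minimal sketch that assumes each event in an uncertain sequence is a single item with an independent existence probability. The expected support of a pattern is then the sum, over all sequences, of the probability that the pattern occurs as a subsequence. The data model, the function names, and the independence assumption are illustrative; this is not the uWSequence algorithm, and the iMaxPr and expSupport(top) measures are not reproduced here.

```python
from typing import List, Sequence, Tuple

UncertainSequence = List[Tuple[str, float]]  # (item, existence probability)


def occurrence_probability(pattern: Sequence[str], seq: UncertainSequence) -> float:
    """Probability that `pattern` is a subsequence of a random world of `seq`.

    dist[j] is the probability that a leftmost greedy match has consumed
    exactly j pattern items so far (events assumed independent).
    """
    m = len(pattern)
    dist = [1.0] + [0.0] * m
    for item, prob in seq:
        new = [0.0] * (m + 1)
        new[m] = dist[m]  # a fully matched pattern stays matched
        for j in range(m):
            if item == pattern[j]:
                new[j + 1] += dist[j] * prob      # event exists: advance
                new[j] += dist[j] * (1.0 - prob)  # event absent: stay
            else:
                new[j] += dist[j]
        dist = new
    return dist[m]


def expected_support(pattern: Sequence[str], database: List[UncertainSequence]) -> float:
    """Sum of per-sequence occurrence probabilities."""
    return sum(occurrence_probability(pattern, seq) for seq in database)


# Hypothetical toy database of two uncertain sequences.
database = [
    [("a", 0.5), ("b", 0.8)],
    [("a", 0.5), ("a", 0.5), ("b", 1.0)],
]
exp_sup = expected_support(["a", "b"], database)
print(exp_sup)
```

The dynamic program runs in time proportional to the sequence length times the pattern length, which avoids enumerating all possible worlds.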
Abstract:
From the beginning of sequential pattern mining to the present, this field has received considerable attention within the data mining area because it has wide application in several significant computational problems. Many algorithms have been created and several techniques have been used with the objective of improving the discovery of the frequent sequence set. In this paper we present the main characteristics of some of the most important sequential pattern mining algorithms. We also present a comparative performance study of these algorithms.
Abstract:
Sequential pattern mining in a data stream environment is an interesting data mining problem. The problem of finding sequential patterns in static databases has been studied extensively in past years; however, mining sequential patterns in data streams is still an active field of research. In this research, a new greedy sequential pattern mining algorithm for data streams is introduced; it is used to find the strongly supported sequences. The proposed algorithm is based on the sequence tree, which is used to find sequential patterns in static databases. The proposed algorithm divides the stream into patches, or windows, and each patch updates the sequence tree built from the previous windows. An example is introduced to explain how this algorithm works. We also show the efficiency and effectiveness of the proposed algorithm on a synthetic dataset and demonstrate how well it suits a data stream environment. We show experimentally that the proposed algorithm is more efficient than the PrefixSpan algorithm in CPU time for patterns with any support below 30%, and in memory usage for patterns with any support below 60%.
Abstract:
A large event sequence can generate episode rules, which are patterns that help to identify possible dependencies among event types. Frequent episodes occurring in a simple sequence of events are commonly used for mining episodes from a sequential database. Mining serial positioning episode rules (MSPER) using a fixed-gap episode occurrence suffers from unsatisfactory scalability when testing whether an episode occurs in a complex sequence. A large number of redundant nodes is generated in the MSPER-trie-based data structure. In this paper, a forward and backward search algorithm (FBSA) is proposed to detect minimal occurrences of frequent peak episodes. An extensive correlation of parameter settings and the generating procedure of fixed-gap episodes is carried out. To generate a fixed-gap episode and estimate the variance that decides the parameter selection in event sequences, Spearman's correlation coefficient is used for verifying the sequence of occurrences of the episodes. MFSPER with FBSA is developed to eliminate the frequent sequence scans and redundant event sets. MFSPER-FBSA stores the minimal occurrences of frequent peak episodes from the event sequences. The experimental evaluation on benchmark datasets shows that the proposed technique outperforms the existing methods with respect to memory, execution time, recall and precision.
Abstract:
A method is proposed for selecting a rational mining sequence with internal dumping for flat stratified deposits, using new principles of the open-pit process-space formation and development. The main criteria for substantiating the mining sequence are geometrical form and development direction of the open-pit space, structure of the working wall and transportation network, internal dumping capacities and mining earthworks volumes.
Abstract:
Unsupervised sequence learning is important to many applications. A learner is presented with unlabeled sequential data, and must discover sequential patterns that characterize the data. Popular approaches to such learning include (and often combine) frequency-based approaches and statistical analysis.
However, the quality of results is often far from satisfactory. Though most previous investigations seek to address method-specific limitations, we instead focus on general (method-neutral) limitations in current approaches. This paper takes two key steps towards addressing such general quality-reducing flaws. First, we carry out an in-depth empirical comparison and analysis of popular sequence learning methods in terms of the quality of information produced, for several synthetic and real-world datasets, under controlled settings of noise. We find that both frequency-based and statistics-based approaches (ⅰ) suffer from common statistical biases based on the length of the sequences considered; (ⅱ) are unable to correctly generalize the patterns discovered, thus flooding the results with multiple instances (with slight variations) of the same pattern. We additionally show empirically that the relative quality of different approaches changes based on the noise present in the data: Statistical approaches do better at high levels of noise, while frequency-based approaches do better at low levels of noise. As our second contribution, we develop methods for countering these common deficiencies. We show how to normalize rankings of candidate patterns such that the relative ranking of different-length patterns can be compared. We additionally show the use of clustering, based on sequence similarity, to group together instances of the same general pattern, and choose the most general pattern that covers all of these. The results show significant improvements in the quality of results in all methods, and across all noise settings.
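The length bias described above can be illustrated with a short sketch. Longer patterns are naturally rarer, so raw frequencies favor short patterns; one simple remedy, assumed here purely for illustration, is to standardize each pattern's frequency against the other patterns of the same length (a z-score per length group). The abstract does not specify the authors' normalization, so the function name and the scheme below are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, Tuple

Pattern = Tuple[str, ...]


def length_normalized_scores(freqs: Dict[Pattern, int]) -> Dict[Pattern, float]:
    """Z-score each pattern's frequency within its own length group."""
    by_length = defaultdict(list)
    for pattern, freq in freqs.items():
        by_length[len(pattern)].append(freq)

    scores = {}
    for pattern, freq in freqs.items():
        group = by_length[len(pattern)]
        mu, sigma = mean(group), pstdev(group)
        # A group with no spread carries no ranking information.
        scores[pattern] = 0.0 if sigma == 0 else (freq - mu) / sigma
    return scores


# Hypothetical raw frequencies: short patterns dominate in absolute terms.
freqs = {
    ("a",): 100, ("b",): 80, ("c",): 60,
    ("a", "b"): 12, ("b", "c"): 4, ("a", "c"): 2,
}
scores = length_normalized_scores(freqs)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

After normalization, the two-item pattern `("a", "b")` outranks the single item `("a",)` even though its raw frequency is far lower, because it stands out more strongly among patterns of its own length.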
Abstract:
Spatiotemporal event sequences (STESs) are ordered series of event types whose instances frequently follow each other in time and are located close by. An STES is a spatiotemporal frequent pattern type, which is discovered from moving region objects whose polygon-based locations continuously evolve over time. Previous studies on STES mining require significance and prevalence thresholds for the discovery, which are usually unknown to domain experts. The quality of the discovered sequences is of great importance to the domain experts who use these algorithms. We introduce a novel algorithm to find the most relevant STESs without threshold values. We tested the relevance and performance of our threshold-free algorithm with a case study on solar event metadata, and compared the results with the previous STES mining algorithms.
Abstract:
Rare event analysis is an area that includes methods for the detection and prediction of events, e.g. a network intrusion or an engine failure, that occur infrequently and have some impact on the system. Various methods from the areas of statistics and data mining exist for that purpose. In this article we propose PREVENT, an algorithm which uses inter-transactional patterns for the prediction of rare events in transaction databases. PREVENT is a general-purpose inter-transaction association rule mining algorithm that optimally fits the demands of rare event prediction. It requires only one scan of the original database and two scans of the transformed database, which is considerably smaller, and it is complete in that it does not miss any patterns. We provide the mathematical formulation of the problem and experimental results that show PREVENT's efficiency in terms of run time and its effectiveness in terms of sensitivity and specificity.
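For readers unfamiliar with inter-transaction patterns, the sketch below shows the sliding-window transformation commonly used in the inter-transaction mining literature: each window of consecutive transactions is flattened into one "mega-transaction" whose items are tagged with their offset from the window start. This is an assumed, generic transformation for illustration; the abstract does not describe how PREVENT builds its transformed database, and the function name is hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

ExtendedItem = Tuple[str, int]  # (item, offset from the window start)


def to_megatransactions(
    transactions: List[Set[str]], window: int
) -> List[FrozenSet[ExtendedItem]]:
    """Flatten each sliding window of transactions into one extended itemset."""
    mega = []
    for start in range(len(transactions)):
        extended = set()
        # Windows near the end of the database are simply shorter.
        for offset, transaction in enumerate(transactions[start:start + window]):
            for item in transaction:
                extended.add((item, offset))
        mega.append(frozenset(extended))
    return mega


# Hypothetical database of three time-ordered transactions.
transactions = [{"a"}, {"b"}, {"a", "c"}]
mega = to_megatransactions(transactions, window=2)
for m in mega:
    print(sorted(m))
```

An ordinary association rule mined over these extended items, such as `("a", 0)` implying `("b", 1)`, reads as "if `a` occurs now, `b` occurs one transaction later", which is what makes the approach suitable for prediction.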